An approximate stochastic optimal control framework to simulate nonlinear neuro-musculoskeletal models in the presence of noise

Optimal control simulations have shown that both musculoskeletal dynamics and physiological noise are important determinants of movement. However, due to the limited efficiency of available computational tools, deterministic simulations of movement focus on accurately modelling the musculoskeletal system while neglecting physiological noise, and stochastic simulations account for noise while simplifying the dynamics. We took advantage of recent approaches where stochastic optimal control problems are approximated using deterministic optimal control problems, which can be solved efficiently using direct collocation. We were thus able to extend predictions of stochastic optimal control as a theory of motor coordination to include muscle coordination and movement patterns emerging from non-linear musculoskeletal dynamics. In stochastic optimal control simulations of human standing balance, we demonstrated that the inclusion of muscle dynamics can predict muscle co-contraction as minimal effort strategy that complements sensorimotor feedback control in the presence of sensory noise. In simulations of reaching, we demonstrated that nonlinear multi-segment musculoskeletal dynamics enables complex perturbed and unperturbed reach trajectories under a variety of task conditions to be predicted. In both behaviors, we demonstrated how interactions between task constraint, sensory noise, and the intrinsic properties of muscle influence optimal muscle coordination patterns, including muscle co-contraction, and the resulting movement trajectories. Our approach enables a true minimum effort solution to be identified as task constraints, such as movement accuracy, can be explicitly imposed, rather than being approximated using penalty terms in the cost function. Our approximate stochastic optimal control framework predicts complex features, not captured by previous simulation approaches, providing a generalizable and valuable tool to study how musculoskeletal dynamics and physiological noise may alter neural control of movement in both healthy and pathological movements.

-A) Optimal mean hand trajectories for the three reaching tasks with 95% confidence ellipses every 10ms during the movement. B) Optimal tangential hand velocity profiles of the mean reaching movement. C) Mean muscle activations of the six different muscles for the different reaching targets. The 'circle' (blue) and 'bar' (red) activations often overlap.

Effect of biarticular muscles and muscle properties
While developing the simulations, we noticed that the arm model and especially muscle properties could largely affect the reach trajectories. We therefore decided to explore how the presence of biarticular muscles and muscle properties affected simulated unperturbed and perturbed reach trajectories and muscle activations. We performed simulations with three models (Figure 2 and Figure  3). Model 1 (Todorov2003 -6 muscles) is the model presented in the main text. Model 2 (Todorov2003 -4 muscles) was derived from model 1 by removing the biarticular muscles. Model 3 (adapted model -4 muscles) also contained only mono-articular muscles and was derived from a more complex musculoskeletal model ( [4], [5]). Model 3 differs from models 1 and 2 in its moment arms, muscletendon lengths and maximal isometric muscle forces (Table 1). In addition, the force-length relationship in model 3 was shifted such that normalized fiber lengths remained within a plausible operating range, i.e. remained below normalized fiber lengths at which huge passive forces would be developed, for the full range of motion for the movement. To this aim, we evaluated the force-length relationship for ̃, ℎ , where ̃, ℎ = 0.5̃+ 0.3.  Removing the biarticular muscles (model 1 → model 2) altered the mean muscle activations but had little effect on the reaching trajectories ( Figure 2). During unperturbed reaches, the kinematic behavior of the models with and without biarticular muscles is strikingly similar (Figure 2, unperturbed reaching). Even in the presence of perturbations, reaching trajectories are very similar for both models for all three reaching tasks ( Figure 2, perturbed reaching).
Altering the muscle-tendon properties (model 2 → model 3) had a large effect on both mean muscle activations and reaching trajectories ( Figure 2 and Figure 3). Reaching trajectories were less variable for model 3 than for model 2, especially for the bar condition ( Figure 2, unperturbed and perturbed reaching). Model 3 predicted later corrections to perturbations resulting in an overshoot of the target for the circle and bar conditions ( Figure 2, perturbed reaching). This overshoot as observed in model 3 was in agreement with experimental data for the bar condition but not for the circle condition.
The implementations of all three models can be found in the supplemented code.

Rigid tendon assumption
In this section we aim to provide more insight in how the rigid tendon assumption might have affected the results. Note that using rigid tendons resulted in algebraic contraction dynamics and simplified the optimal control problems. However, the framework is sufficiently general to consider compliant tendons in future work. Adding a compliant tendon in series to the muscle fibers will reduce the compliance of the muscle-tendon unit. This will have a large influence on muscle-tendon stiffness when the muscle is stiffer than the tendon as in our simulations of standing balance with short-range stiffness, but a small influence on muscle-tendon stiffness when the muscle is more compliant than the tendon as in our simulations of standing balance and reaching with a Hill-type muscle model. Here, we illustrate this for the soleus muscle. We chose the soleus, as it has a longer and thus more compliant tendon than the other muscles we modeled. Therefore, the effect of adding a compliant tendon is expected to be largest for this muscle.
We computed stiffness of the muscle fibers as = cos( ) with the muscle fiber force, the muscle fiber length and the pennation angle; stiffness of the tendon as = with the tendon force, the tendon length. Stiffness of the muscle-tendon unit was computed as: = Note that when assuming a rigid tendon = .
We first discuss the observations for a soleus Hill-type muscle without short-range-stiffness. In the rigid tendon case, the length of both the fiber and tendon is the same for every activation level ( Figure 4A length, dotted lines). In the rigid tendon case, , and thus , increases linearly with muscle activation level as the muscle is operating on the ascending limb of the force-length relationship ( Figure  4A -stiffness, green dotted line). In the compliant tendon case, the fiber length decreases for increasing activation, changing the operating point of the muscle on the force-length relation, and the tendon length increases for increasing activation (Figure 4 Alength, solid lines). As a consequence, ( Figure  4A stiffness, blue solid line) increases at a superlinear rate with activation level as muscle fiber length decreases with increasing activation level shifting the muscle's operating length to a steeper part of the force-length relation in this specific case. However, differences in muscle stiffness when considering a compliant instead of a rigid tendon are small Figure 4A detail MT stiffness). The compliant tendon ( Figure 4A stiffness, red solid line) is much stiffer than the muscle fibers (at least factor 6), and as a result has a very minor effect on (green solid line). In Figure 4A detail MT stiffness, we see that at high muscle activations (above 0.75), muscle-tendon stiffness becomes higher for the compliant than for the rigid tendon, because the increase in fiber stiffness due to fiber shortening outweighs the effect of the compliant tendon.
In the extended Hill-type model where we added activation dependent stiffness to the muscle fibers modeling short-range-stiffness, the addition of a compliant tendon has a large influence on muscletendon stiffness as the muscle fibers become much stiffer than the tendon due to the added element (compare Figure 4 A vs Figure 4B -stiffness). Note that even with the inclusion of a compliant tendon, modeling short-range stiffness results in about a five-fold increase in stiffness of the muscle-tendon unit at an activation level of 0.5 ( Figure 4A vs Figure 4B, detail MT stiffness). We therefore expect simulations based on the muscle model with short-range-stiffness to change when including a compliant tendon. We speculate that for platform rotations in healthy individuals, postural sway might be lower as their strategy is to move the ankle in anti-phase to the platform which should be facilitated by a more compliant muscle-tendon unit. For vestibular loss subjects that try to follow the platform motion with their body a decrease in muscle-tendon stiffness might lead to lower levels of co-contraction, as it becomes less efficient, and higher postural sway. In the simulations of platform translations muscle cocontraction becomes a less efficient strategy at lower muscle tendon stiffness and would thus decrease for both healthy and vestibular loss subjects. This would then lead to an increase in postural sway for the translations.

Approximate stochastic optimal control
We formulate predictive simulations of human movement as stochastic optimal control problems of the following form: (pathconstraints) (5) (boundaryconditions) (6) with the deterministic controls consisting of the feedforward excitations ( )and the time-varying linear feedback gain matrix ( ), ( ) the stochastic states, ( ) is the stochastic feedback signal, ( ) are the total muscle excitations (combination of feedforward and feedback), is Gaussian system noise (e.g. motor noise), and is Gaussian noise corrupting the policy (e.g. sensory noise). The system and policy noise inject uncertainty into the simulations, resulting in stochastic state variables. Consequently, all variables that depend on the state are stochastic as well. The controls, ( ) and ( ), are deterministic variables.
Although the noise sources are Gaussian, the distribution of the stochastic variables are generally non-Gaussian due to nonlinearities in the dynamics. In the proposed approach, we approximate the state distribution as Gaussian. Consequently, we can describe the distribution of the state trajectory by the mean state trajectory ( ( )) and the state covariance trajectory ( ( )). The dynamics of the mean state trajectory are approximated by the deterministic ( = [ , ] = 0) version of the system dynamics (•). The propagation of the state covariance is approximated using the continuous Lyapunov differential equations [6], [7], which assume local invariance of the system dynamics around the mean trajectory: with ′ the covariance matrix describing Gaussian noise sources. The propagation of the state covariance matrix is as in an Extended Kalman Filter [8].
The propagation of ( ) can be interpreted as the change in uncertainty due to the dynamics ( ( ) ( ) + ( ) ( ) ) and the injection of uncertainty by the noise sources ( ( ) ′ ( )). The dynamics can either act to dissipate uncertainty or to increase uncertainty depending on the system.
Based on the distribution of the stochastic variables, the constraints can be robustified as well. To this aim, the distribution of the constraint equations is approximated based on the assumption of a Gaussian state distribution. Hence, the constraint standard deviation is √ ( ) The constraints are then reformulated as follows: ( ( ), ( ), ( ), ( )) + √ ( ) ≥ 0 with a parameter that defines the chance that the constraint is fulfilled. Due to the use of a Gaussian state distribution, even solutions far away of the mean state have a finite chance of occurring and therefore it will not be possible to impose the constraints for the entire state distribution.

Positive definiteness preserving discretization
The state covariance matrix ( ) is positive-definite by definition. However, numerical integration can destroy this property due to the accumulation of integration errors. Gillis and Diehl noted and addressed this issue [9]; they propose a positive-definiteness preserving discretization of the Lyapunov differential equation as explained below. The mathematical proof can be found in [9], [10].
If direct collocation is applied to transcribe the optimal control problem, the dynamics can be discretized into a set of constraint functions and that describe the numerical integration: with auxiliary variables defining the state at the collocation points within each integration interval and the integration interval. It is shown in [10], [11] that given the discretized dynamics in the above form, the continuous propagation of the state covariance matrix can be discretized into the following form that guarantees the preservation of positive definiteness: For example, in the case of a trapezoidal integration scheme the discretized dynamics (eq. (11) and eq. (12)) of a system with continuous dynamics in the general form ̇= f( , , w) become:

Combination with implicit formulation of system dynamics
Formulating the dynamic equations implicitly, ( , ,, ) = 0, rather than explicitly, ̇= ( , , ) has been shown to improve the efficiency of deterministic optimal control simulations of movement [12]. This approach avoids the inversion of near-singular matrices (e.g. the mass matrix in multibody systems with bodies of different mass, length and inertia), which is numerically an unstable operation and can become an issue for underlying NLP solvers. In practice, when direct collocation is applied and the dynamics are formulated implicitly, one can augment the variable space with 'slack controls' (̇) that represent the state derivatives and impose the system dynamics in an implicit manner: ( , ,̇, ) = 0. When using an implicit formulation and a trapezoidal integration scheme, the discretized dynamics become: ( , ,̇, ) = 0 (19) To formulate the propagation of the state covariance matrix, the sensitivity of the state derivatives with respect to the states and noise sources is required (eq. 8 and eq. 9). When the dynamics are formulated implicitly these expressions are not readily available. To conserve the possibility to formulate the system dynamics implicitly it is required to augment the variable space once again with 'slack controls' that represent the derivatives of the state derivatives with respect to the state (̇) and the noise sources (̇). We determine those through the following (implicit) constraints that follows from taking the partial derivatives to the state x and the noise sources w of eq. 19 and applying the chain rule: Eq. 20 and eq. 21 can then be combined with the positive definiteness preserving Lyapunov discretization. In case of the trapezoidal integration scheme we obtain: = − 2̇, +1 (23) Approximate stochastic optimal control framework Here, we describe the approximate stochastic optimal control problem for the class of problems we are solving in this study. Given the implicit stochastic dynamics: With a helper variable introduced to avoid the inversion of the matrix ( ) . Note further that through linearization we can write: Equation (27) can be derived from equation (1)  Notes on integration schemes When the system dynamics are not known a priori, because the feedback gains that determine the closedloop dynamics are optimization variables, we could end up in a situation where the propagation of the covariance is inaccurate depending on the chosen discretization scheme. Backward and forward Euler integrations schemes are problematic, whereas a trapezoidal integration scheme is more reliable.

Why a forward Euler scheme is problematic for the propagation of covariance
In the case of using a forward Euler discretization scheme, eq. 11 and eq. 12 become: This leads to the following expressions for the positive definiteness preserving Lyapunov discretization scheme: We thus obtain: To demonstrate why a backward Euler scheme might be problematic, we only consider the first in eq. 44 that describes how the covariance matrix changes due to the dynamics. For this example, let us consider that the dynamics describes exponential growth/decay: ̇= • . If < 0 initial variance of the state decreases over time, with a faster decrease of uncertainty with increasing absolute value of . If > 0 initial variance of the state increases over time, with a faster increase of uncertainty with increasing absolute value of .
In this case the discrete covariance propagation equation becomes: 2 (44) We observe the following problems.
(1) The uncertainty can be completely eliminated from the system by choosing = −1/ .
Although a negative alpha does stabilize the dynamics and uncertainty would asymptotically disappear over time, this could result in strong over-estimation of the dissipative features of the dynamics. (2) When ≪ −1/ and thus we have a strongly dissipative system, uncertainty will accumulate strongly. The discrete dynamics do not capture the dissipative feature of the system dynamics in this case. One could argue that must be chose smaller, but if is an optimization variable, it cannot be verified in advance whether is small enough.
If for example the objective is to minimize the uncertainty in the system, the optimization will select = −1/ while an ≪ −1/ would in reality be a better solution.

Why a backward Euler scheme is problematic for the propagation of covariance
In the case of using a backward Euler discretization scheme, eq. 11 and eq. 12 become (appreciate the difference with the forward Euler scheme): We thus obtain: +1 = ( −̇, ) −1 ( + ̇, +1̇, +1 )(( −̇, ) −1 ) (50) Again, we ignore the term due to noise and only consider the dynamics term. Let us consider the same dynamics as in the previous example: ̇= • . In this case we find: In contrast to the forward Euler scheme, there is no finite that would eliminate all uncertainty in a single time step. If we choose < 0, we will dissipate uncertainty from the system as is expected. The larger the time step , the more we will underestimate how much uncertainty is dissipated from the system. With smaller step size we approach the true value. If we choose 1 > • > 0 it is clear that uncertainty will accumulate, which makes sense. However, we observe again two issues: (1) When choosing • = 1 the uncertainty will become infinite at the next time-step. This can become an issue during an iterative optimization process where an iteration might bump into values that are infinite.
(2) When • > 2, the system appears to dissipate uncertainty while in reality it should be accumulating uncertainty.

Trapezoidal scheme to solve these integrator stability issues
In case of a trapezoidal scheme, the same example dynamics ̇= • results in the following discrete covariance propagation equation: The behavior (accumulative/dissipative) of the system integrated by this integrator is correct for all values of .
If ≫ 0 we still have an accumulation of uncertainty as the asymptote lies at 1 for 2 • = + .
If < 0 we have a dissipative system.
There is an asymptote where uncertainty accumulates infinitely quick if 2 • = 1 which can still give us an issue during optimization. However, current gradient-based optimization algorithms for solving nonlinear programs (IPOPT [13], SNOPT [14]) have heuristics implemented that make them relatively robust to such problems.